Distributional Consistency: As a General Method for Defining a Core Lexicon

Authors

  • Huarui Zhang
  • Chu-Ren Huang
  • Shiwen Yu
Abstract

We propose Distributional Consistency (DC) as a general method for defining a Core Lexicon. The properties of DC are investigated theoretically and empirically, showing that it is clearly distinguishable from word frequency and range of distribution. DC is also shown to reflect intuitive interpretations, especially when its value is close to 1. Its immediate applications in NLP include defining a core lexicon for a language and identifying topical words in a document. We also categorize the existing measures of dispersion into three groups, via ratio of norm or entropy, and propose a simplified measure as well as a combined measure. These new measures can serve as virtual prototypes or intermediate forms for the study and comparison of existing measures in the future.
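
The abstract does not reproduce the formula itself; the sketch below assumes the formulation commonly associated with DC, computing it for a word from its per-part frequencies over n equal-sized corpus parts as DC = (sum_i sqrt(f_i))^2 / (n * sum_i f_i), which lies in [0, 1] and reaches 1 only when the word is spread perfectly evenly. The toy corpus and word choices are purely illustrative and not taken from the paper.

```python
from collections import Counter

def distributional_consistency(freqs):
    """Distributional Consistency of one word over n equal-sized corpus parts.

    freqs[i] is the word's raw frequency in part i.  Assumed formulation:
    DC = (sum_i sqrt(f_i))**2 / (n * sum_i f_i), a value in [0, 1].
    """
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        return 0.0
    sqrt_sum = sum(f ** 0.5 for f in freqs)
    return (sqrt_sum ** 2) / (n * total)

# Hypothetical toy corpus split into four equal-length parts.
parts = [
    "the cat sat on the mat".split(),
    "the dog barked at the cat".split(),
    "the sun rose over the hill".split(),
    "the quantum paper cites the lexicon".split(),
]
counts = [Counter(p) for p in parts]

for word in ["the", "quantum"]:
    freqs = [c[word] for c in counts]
    print(word, round(distributional_consistency(freqs), 3))
# "the" occurs twice in every part  -> DC = 1.0 (evenly spread, core-like)
# "quantum" occurs in only one part -> DC = 1/4 = 0.25
```

Under this assumed formulation, a word confined to a single part scores 1/n no matter how frequent it is there, which is what separates DC from raw frequency and makes values near 1 the intuitive signal of core-lexicon membership mentioned above.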

Similar Resources

Distributional Semantics and the Lexicon

The lexicons used in computational linguistics systems contain morphological, syntactic, and occasionally also some semantic information (such as definitions, pointers to an ontology, verb frame filler preferences, etc.). But the human cognitive lexicon contains a great deal more, crucially, expectations about how a word tends to combine with others: not just general information-extraction-like...

Imam Sadegh’s (AS) Hadiths in Sunni’s lexicon

The Quran and Hadiths, including the Hadiths of the Infallibles (AS) such as Imam Sadegh (AS), have long been both a reference for compilation and a field of research for Arab morphologists. Imam Sadegh's (AS) Hadiths, based first on Sunni lexicons and then on other Islamic science books, will be illustrated in this research in order to identify where these Hadiths hav...

A Bayesian Framework for Learning Words From Multiword Utterances

Current computational models of word learning make use of correspondences between words and observed referents, but as of yet cannot—as human learners do—leverage information regarding the meaning of other words in the lexicon. Here we develop a Bayesian framework for word learning that learns a lexicon from multiword utterances. In a set of three simulations we demonstrate this framework’s fun...

A New Semantics: Merging Propositional and Distributional Information

Despite hundreds of years of study on semantics, theories and representations of semantic content—the actual meaning of the symbols used in semantic propositions—remain impoverished. The traditional extensional and intensional models of semantics are difficult to actually flesh out in practice, and no large-scale models of this kind exist. Recently, researchers in Natural Language Processing (N...

Implementing a Reverse Dictionary, based on word definitions, using a Node-Graph Architecture

In this paper, we outline an approach to build graph-based reverse dictionaries using word definitions. A reverse dictionary takes a phrase as an input and outputs a list of words semantically similar to that phrase. It is a solution to the Tip-of-the-Tongue problem. We use a distance-based similarity measure, computed on a graph, to assess the similarity between a word and the input phrase. We...

Journal:

Volume   Issue

Pages  -

Publication date: 2004